Improving the performance of the matrix inversion on a Tesla GPU

Authors

  • Pablo Ezzatti
  • Enrique S. Quintana-Ortí
  • Alfredo Remón
Abstract

We study two different techniques for computing a matrix inverse: the traditional approach based on Gaussian (LU) factorization, and the Gauss-Jordan elimination alternative, which is better suited to parallel architectures. The target architecture is a current general-purpose multi-core processor (CPU) connected to a graphics processor (GPU). Parallelism is obtained from the MKL (CPU) and CUBLAS (GPU) libraries, as well as by performing operations on both architectures simultaneously. Numerical experiments performed on a system equipped with two Intel QuadCore processors and a Tesla C1060 GPU illustrate the efficiency attained by the Gauss-Jordan elimination implementation.
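
The paper itself includes no source code. As a rough illustration of the algorithm named in the abstract, the sketch below implements unblocked, in-place Gauss-Jordan inversion in plain CUDA, without pivoting; all kernel and variable names are our own invention, and this is not the implementation evaluated in the paper, which builds on CUBLAS and MKL.

```cuda
// gje_inverse.cu -- illustrative sketch only; not the authors' implementation.
// In-place Gauss-Jordan inversion of a row-major n x n matrix, unblocked and
// without pivoting (all pivots are assumed nonzero).
#include <cstdio>
#include <cuda_runtime.h>

// Save column k before it is overwritten, so the elimination step can
// read the multipliers without a race.
__global__ void save_column(const double *A, double *col, int n, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) col[i] = A[i * n + k];
}

// Divide row k by the pivot; the diagonal entry becomes 1/pivot, which is
// the in-place trick that lets A be overwritten by its inverse.
__global__ void scale_pivot_row(double *A, const double *col, int n, int k) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n) A[k * n + j] = ((j == k) ? 1.0 : A[k * n + j]) / col[k];
}

// Subtract col[i] times the scaled pivot row from every other row; the
// (j == k) branch zeroes the old column-k entry first, so it ends up
// holding -col[i]/pivot, as the in-place inverse requires.
__global__ void eliminate(double *A, const double *col, int n, int k) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n && i != k)
        A[i * n + j] = ((j == k) ? 0.0 : A[i * n + j]) - col[i] * A[k * n + j];
}

// Host driver: one pass of the three kernels per column.
void gje_invert(double *dA, double *dcol, int n) {
    dim3 t1(256), g1((n + 255) / 256);
    dim3 t2(16, 16), g2((n + 15) / 16, (n + 15) / 16);
    for (int k = 0; k < n; ++k) {
        save_column<<<g1, t1>>>(dA, dcol, n, k);
        scale_pivot_row<<<g1, t1>>>(dA, dcol, n, k);
        eliminate<<<g2, t2>>>(dA, dcol, n, k);
    }
    cudaDeviceSynchronize();
}

int main() {
    const int n = 3;
    double hA[n * n] = {4, 7, 2, 3, 6, 1, 2, 5, 3};  // det = 9, no pivoting needed
    double *dA, *dcol;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dcol, n * sizeof(double));
    cudaMemcpy(dA, hA, n * n * sizeof(double), cudaMemcpyHostToDevice);
    gje_invert(dA, dcol, n);
    cudaMemcpy(hA, dA, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)                      // prints the inverse of A
        printf("%8.4f %8.4f %8.4f\n", hA[i * n], hA[i * n + 1], hA[i * n + 2]);
    cudaFree(dA);
    cudaFree(dcol);
    return 0;
}
```

Each iteration here launches three small element-wise kernels. The performance reported in the paper comes instead from recasting the bulk of this update as large matrix operations handled by CUBLAS on the GPU and MKL on the CPU, with work executing on both devices at once, as the abstract describes.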

Similar resources

Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function

We investigate the performance of two approaches for matrix inversion based on Gaussian (LU factorization) and Gauss-Jordan eliminations. The target architecture is a current general-purpose multicore processor connected to a graphics processor (GPU). Parallelism is extracted in both processors by linking sequential versions of the codes with multi-threaded implementations of BLAS. Our results ...

A Direct Matrix Inversion-Less Analysis for Distribution System Power Flow Considering Distributed Generation

This paper presents a new direct matrix-inversion-less analysis for radial distribution systems (RDSs). The method can also deal successfully with weakly meshed distribution systems (WMDSs). Being easy to implement, direct methods (DMs) provide excellent performance. Matrix inversion is the main cause of divergence and low efficiency in power flow algorithms. In this paper, the performance of t...

High-Performance Matrix-Vector Multiplication on the GPU

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing the fine-grained parallelism of the many-...
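
For context, a baseline matrix-vector kernel with one thread per row is sketched below; the test problem and all names are our own, and this is not the tuned Fermi kernel described in the paper, only the naive parallelization it improves upon.

```cuda
// matvec.cu -- minimal illustrative baseline, not the paper's kernel.
// Computes y = A * x for a row-major m x n matrix, one thread per row.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void matvec(const float *A, const float *x, float *y, int m, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)        // dot product of row with x
            acc += A[row * n + j] * x[j];
        y[row] = acc;
    }
}

int main() {
    const int m = 4, n = 3;
    float hA[m * n], hx[n], hy[m];
    for (int i = 0; i < m * n; ++i) hA[i] = 1.0f;   // all-ones test matrix
    for (int j = 0; j < n; ++j)     hx[j] = (float)(j + 1);
    float *dA, *dx, *dy;
    cudaMalloc(&dA, sizeof(hA));
    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dy, sizeof(hy));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
    matvec<<<(m + 255) / 256, 256>>>(dA, dx, dy, m, n);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);
    for (int i = 0; i < m; ++i)
        printf("y[%d] = %.1f\n", i, hy[i]);         // each entry: 1+2+3 = 6
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```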

A Parallel Algebraic Multigrid Solver on Graphics Processing Units

The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the manycore GPU architecture. A performance comparison of the parallel sol...

Finite Element Matrix Generation on a GPU

This paper presents an efficient technique for the fast generation of the sparse systems of linear equations that arise in computational electromagnetics when the finite element method with higher-order elements is used. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x T...

Publication year: 2010